An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
Abstract:
The Internet provides easy access to a wide range of library resources. However, classifying documents drawn from large volumes of data remains a problem, and finding specific documents still demands time and effort. Grouping similar documents into specific classes can reduce the time needed to search for the required data, particularly for text documents. Artificial Intelligence (AI) and optimization algorithms, which show high potential for Feature Selection (FS) and word extraction, make this easier. In this paper, the Crow Search Algorithm (CSA) is used for FS and K-Nearest Neighbor (KNN) for classification. Additionally, the Term Frequency (TF) technique is proposed for counting words and calculating their frequencies. The analysis is performed on the Reuters-21578, WebKB and Cade 12 datasets. The results indicate that the proposed model classifies more accurately than the KNN model and achieves a higher F-Measure than KNN and C4.5. Moreover, by using FS, the proposed model improves classification accuracy by 27% compared to KNN.
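The pipeline outlined in the abstract (TF features, CSA-driven feature selection, KNN classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 threshold used to turn a crow's continuous position into a feature mask, the leave-one-out fitness, and the parameter values (`n_crows`, `iters`, `ap`, `fl`) are all assumptions made for illustration. The position update follows the crow search rule as it is commonly described, with an awareness probability `ap` and a flight length `fl`.

```python
import math
import random
from collections import Counter


def term_frequency(doc, vocab):
    """Relative frequency of each vocabulary word in one document."""
    words = doc.lower().split()
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty documents
    return [counts[w] / total for w in vocab]


def knn_predict(train_X, train_y, x, mask, k=3):
    """Majority vote among the k nearest training vectors (Euclidean
    distance), using only the features where mask is True."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2
                             for ai, bi, m in zip(a, b, mask) if m))
    nearest = sorted(range(len(train_X)),
                     key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def fitness(mask, X, y, k=1):
    """Leave-one-out KNN accuracy for a feature mask (assumed fitness).
    Expects at least two documents."""
    if not any(mask):
        return 0.0  # an empty feature set cannot classify anything
    hits = 0
    for i in range(len(X)):
        rest_X, rest_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], mask, k) == y[i]
    return hits / len(X)


def to_mask(position):
    """Assumed binarization: keep a feature when its coordinate > 0.5."""
    return [v > 0.5 for v in position]


def crow_search_fs(X, y, n_crows=6, iters=20, ap=0.1, fl=2.0, seed=0):
    """Search for a feature mask with a basic crow search loop."""
    rng = random.Random(seed)
    d = len(X[0])
    pos = [[rng.random() for _ in range(d)] for _ in range(n_crows)]
    mem = [p[:] for p in pos]  # each crow remembers its best position
    mem_fit = [fitness(to_mask(p), X, y) for p in pos]
    for _ in range(iters):
        for i in range(n_crows):
            j = rng.randrange(n_crows)  # crow i picks a crow j to follow
            if rng.random() >= ap:
                # crow j is unaware: move toward j's remembered position
                r = rng.random()
                new = [min(1.0, max(0.0, xi + r * fl * (mj - xi)))
                       for xi, mj in zip(pos[i], mem[j])]
            else:
                # crow j is aware: crow i ends up at a random position
                new = [rng.random() for _ in range(d)]
            pos[i] = new
            f = fitness(to_mask(new), X, y)
            if f > mem_fit[i]:  # update memory only on improvement
                mem[i], mem_fit[i] = new[:], f
    best = max(range(n_crows), key=lambda i: mem_fit[i])
    return to_mask(mem[best]), mem_fit[best]
```

A typical use would build `X` by calling `term_frequency` on each training document with a shared vocabulary, run `crow_search_fs(X, y)` to obtain a mask, and then pass that mask to `knn_predict` when classifying new documents. Thresholding a continuous position is only one of several ways to adapt crow search to a binary selection problem.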
Similar resources
An Improved Flower Pollination Algorithm with AdaBoost Algorithm for Feature Selection in Text Documents Classification
In recent years, production of text documents has seen an exponential growth, which is the reason why their proper classification seems necessary for better access. One of the main problems of classifying text documents is working in high-dimensional feature space. Feature Selection (FS) is one of the ways to reduce the number of text attributes. So, working with a great bulk of the feature spa...
An Improved k-Nearest Neighbor Algorithm for Text Categorization
k is the most important parameter in a text categorization system based on the k-Nearest Neighbor algorithm (kNN). In the classification process, the k documents in the training set nearest to the test document are determined first. Then, the prediction can be made according to the category distribution among these k nearest neighbors. Generally speaking, the class distribution in the training set is unev...
An Improved k-Nearest Neighbor Classification Using Genetic Algorithm
k-Nearest Neighbor (KNN) is one of the most popular algorithms for pattern recognition. Many researchers have found that the KNN algorithm achieves very good performance in their experiments on different data sets. The traditional KNN text classification algorithm has three limitations: (i) computational complexity due to the usage of all the training samples for classification, (ii) the perf...
An improved opposition-based Crow Search Algorithm for Data Clustering
Data clustering is an ideal way of working with a huge amount of data and looking for structure in a dataset. In other words, clustering groups similar data together: the similarity among the data within a cluster is maximal, and the similarity among the data in different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...
volume 9 issue 2
pages 37-48
publication date 2018-05-01